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Abstract 

Quantifying the attack ratio of disease is key to epidemiological inference and Public Health planning. 
For multi-serotype pathogens, however, different levels of serotype-specific immunity make it difficult 
to assess the population at risk. In this paper we propose a Bayesian method for estimation of the 
attack ratio of an epidemic and the initial fraction of susceptibles using aggregated incidence data. We 
derive the probability distribution of the effective reproductive number, R t , and use MCMC to obtain 
posterior distributions of the parameters of a single-strain SIR transmission model with time-varying 
force of infection. Our method is showcased in a data set consisting of 18 years of dengue incidence in the 
city of Rio de Janeiro, Brazil. We demonstrate that it is possible to learn about the initial fraction of 
susceptibles and the attack ratio even in the absence of serotype specific data. On the other hand, the 
information provided by this approach is limited, stressing the need for detailed serological surveys to 
characterise the distribution of serotype-specific immunity in the population. 


Introduction 

Dengue is an arthropod-borne febrile disease caused by a flavivirus with four serotypes which causes 
an estimated 50 million infections each year 1 . In humans, immunity against a particular serotype is 
considered permanent after the exposure and cross immunity to there serotypes is considered short lived [2]. 
As a consequence, the proportions of viral serotypes co-circulating at any point in time are strongly 
dependent on previous incidence patterns of the disease, which determine the number of individuals 
susceptible to each serotype at any point in time. 

Dengue transmission is also modulated by environmental conditions, among which, temperature, due 
to its effects on the vector reproduction, stands out as a strong predictor of incidence [3j|4j. In places 
with sufficient seasonal temperature variation, dengue is predominantly a summer disease. So it is fair to 
say that these environmental fluctuations play a key role in determining beginning and end of epidemic 
periods. This climatic influence is exerted mainly through its effects on the force of infection, which 
cannot be taken as constant [5j but rather as a seasonal (oscillating) function of time. The long term 
dynamics of dengue is also modulated by the alternation of virus types in circulation. Demographics also 
plays a role in replenishing the population of susceptibles. 

The attack ratio (AR) of a disease is a measure of morbidity defined as the number of new cases 
divided by the population at risk. For dengue epidemics, it can be difficult to calculate the AR due to 
the lack of knowledge of the population at risk. The population at risk in this case is the number of 
susceptibles to the circulating virus type(s) before a given epidemic. Thus, in order to calculate the attack 
ratio, we need to determine the number of susceptibles to the circulating virus types right before the 
epidemic, which is virtually impossible without regular virological surveys. 

The attack ratio is also influenced by the reproductive number of the disease 6,7], which is closely 
associated with the force of infection. Thus the incorporation of the effective reproduction number, R t , 
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as a function of time, is crucial to an accurate estimation of the AR of seasonal diseases like dengue and 
Influenza. 

Other methods for estimating the number of susceptibles while accommodating time-varying force 
of infection have been proposed before, for measles [8J.9], a disease that shows remarkable seasonality. 
These methods try to reconstruct the entire series of infectious and susceptibles from case data using 
deterministic models and generally work well for measles because there is a one-to-one relationship between 
exposure and immunity, since measles is caused by a single-strain pathogen. Recently, methods in the 
same fashion were developed for dengue when serotype-specific data is available 110 . When such data 
is not available, the series of susceptibles to all possible serotypes, cannot be reconstructed based solely 
on a deterministic transmission model, since the arrival/re-emergence of new serotypes, an intrinsically 
stochastic events, can drastically change the pool of susceptibles, throwing off any sequential estimation 
based on the incidence dynamics. 

In this paper, we propose a new approach to estimate the number (fraction) of susceptibles using a 
simplified model of dengue transmission based on a single-strain Susceptible-Infectious-Removed (SIR) 
model with time-varying infection rate. In order to bypass the limitations of not knowing the serotype- 
specific seroprevalence and the exact behaviour of the force of infection through time, we propose to inform 
the time-varying transmissibility using the Rt series derived from the notification data 11 . We extend 


a Bayesian framework previously used to estimate the number of susceptibles in Influenza epidemics 
in Europe 12 to include time-varying force of infection and derive a probability distribution for R t to 


accommodate uncertainty in the estimates. Then, from the incidence series and the population at risk, we 
calculate the attack ratio for each epidemic. We apply our method to estimate Sq before every major 
dengue epidemic in the city of Rio de Janeiro, Brazil in the last 18 years. 


Methods 

In this section we will start by describing the data and then the method used to estimate the effective 
reproductive number, R t , from the data and obtain its posterior distribution. We then proceed to describe 
the Susceptible-Infectious-Recovered (SIR) model used to represent the aggregated disease incidence and 
how R t can be integrated into the model to allow for time varying force of infection. Next, an approach 
to approximate the posterior distributions of the numbers of susceptible to the main circulating dengue 
viruses for each epidemic is detailed. Finally, we discuss how to estimate the attack ratio of each epidemic 
using the estimated susceptible fraction and the observed incidence. 

Data 

The data used in this paper consists of time series of weekly notified cases of dengue for the city of 
Rio de Janeiro from 1996 to 2014. The cases are notified based only on clinical criteria. Laboratory 
confirmation and serotype information are available only for a very small sample and only on recent years 
(2010-2013). For the parameter estimation procedures incidence was normalized by dividing the number of 
cases reported by the total city population at each year as given by the census (Census Bureau, Brazilian 
Institute of Geography and Statistics, http://www.ibge.gov.br/english/). 

Estimating the effective reproductive number ( R t ) 

In monitoring of infectious diseases, it is important to assess whether the incidence of a particular 
disease is increasing significantly, in order to decide to take preventive measures. The effective reproductive 
number at time t, Rt , can be understood as a real-time estimate of the basic reproductive number (i?o) 
and is defined as the average number of secondary cases per primary case at time t. 
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Let Y t be the number of reported disease cases for a particular time t £ (0, T). Nishiura el al. (2010) [ll] 
extend the theory developed by Stallybrass et al. (1931) [13] and propose to estimate R t as 


Rt = 


Yt+i 

Y t 


1 /r 


( 1 ) 


where n is taken to be the ratio between the length of reporting interval and the mean generation time 
of the disease. Here we are interested in the simpler case n = 1. If Rt is to be used as a decision tool, 
however, one needs to be able to quantify the uncertainty about estimate in equation [l] Here we detail 
how to obtain credibility intervals for R t under the assumption that the counts Y t are Poisson distributed 
for all t. 

We explore the approach of Ederer and Mantel 14], whose objective is to obtain confidence intervals for 
the ratio of two Poisson counts. Let Y t ~ Poisson(\ t ) and Y t+ \ ~ Poisson(\t+i) and define S = Y t + Y t+ \. 
The authors note that by conditioning on the sum S 


Yi+ilS 1 ~ Binomial(S , 9 t ) 

0 = Xt+1 

At + At+i 


Let c a (6 t ) = {6^,9^} be such that Pr(6[ L ^ < 8 t < o [ U ^) = 
{r[ L \r[ U ^} such that Pr(R[ L ^ < R t < R[ U ' > ) = a. Ederer and Mantel (1974) 
construct a 100a% confidence interval for R t by noting that 


( 2 ) 
( 3 ) 

Analogously, define c a (R t ) = 
show that one can 


14 


r[ l) = 


t(L) 


(1 -o[ L) ) ' (i-ero 


and Ri 1 ^ = 


i(U) 




( 4 ) 


Because the transform from 9 to Rt is monotonically increasing, the result holds for confidence and 
credibility intervals alike. 

Many authors have chosen to quantify the uncertainty about 9 following orthodox approaches (see for 
example 15 and 16 ) mainly for simplicity. We choose instead to take a Bayesian approach and use the 


100a% posterior credibility interval for 9 t as c a {0t). If we choose the conjugate beta prior with parameters 
ao and bo for the binomial likelihood in |2]), the posterior distribution for 9 t is 

p(Qt\Yt+i, S) ~ Beta(Y t+1 + ao, Y t + 6 0 ) (5) 

Combining equations 0 and 0 tells us that the induced posterior distribution of Rt is a beta prime 
(or inverted beta) with parameters a\ = Y t+X + ao and b\ = Y t + b 0 [17 . The density of the induced 
distribution is then 


f P {Rt\a 1 ,b 1 ) = 


r(ai + b x ) 


i?“ 1_1 (l + i? t ) -(ai+hl) 


( 6 ) 


r(ai)r(6i) 

Thus, the expectation of R t is a x /{b\ — 1) and its variance is ai(ai + b x — 1)/ ((&i — 2)(&i — l) 2 ). Note 
that this result holds only for n = 1. Sampling from the posterior in ([6]) can be made straightforward by 
first sampling from ([5]) and then applying the transform in ©. Also, one can choose do and b 0 so as to 
elicit meaningful prior distributions for R t . We show how to elicit the prior for R t from specified prior 
mean and variance or coefficient of variation in the Appendix. 

Also, since Rt > 1 indicates sustained transmission, one may be interested in computing the probability 
of this event. This can be easily achieved by integrating |6]) over the appropriate interval. By noting that 

Pr(R t > 1) = 1 - [ fp(r)dr (7) 


= 1 — Pr(9 t < -) 


(8) 


one can compute the desired probability while avoiding dealing with the density in 0 directly. 
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Mathematical modelling 

A Susceptible-Infectious-Removed (SIR) model is proposed to model dengue dynamics. In the 
traditional formulation of the model, transmission is governed by a constant transmission rate ft and 
recovery happens at a rate r. 

For our analysis we chose to let the force of infection vary with time, just as it does in the actual 
epidemics, as seen in the data. So as the epidemic progresses, the effective transmission rate changes and 
is given by 

m = ^ ( 9 ) 

where Rt is the effective reproductive number, estimated as in[l] The complete model with the time-varying 
force of infection is given by the system of ordinary differential equations: 

-msi (io) 

Pit') SI — tI 
tI 

where S + I + R = 1 V f. Of course, this is a rather simplified model, in which, for instance, the vector 
is omitted. The rationale for this simplification is based on the ability of the empirically derived Rt to 
incorporate the effects of the fluctuating vector populations. Also, although there are multiple circulating 
serotypes, our approach can not discriminate between them due to the lack of serotype-specific data. 
Nevertheless, this modelling strategy can still provide some insight into the disease dynamics and allows 
us to estimate the initial fraction of susceptibles So, a key epidemiological parameter. 

Bayesian parameter estimation 

We take a Bayesian approach to the estimation of So- First the incidence time series was divided into 
J = 13 epidemic windows that corresponded to significant raises in incidence and normalized to lie on the 
[0,1] interval. For a given interval j = {f® tart , f® nd } we observe an incidence time series Yj. We are thus 
interested in the posterior distribution 


dS 

dt 

dl 

dt 

dR 

dt 


p(S 0j |Yj) cx i(Yj|Soj, Rt,m,T)n{Soj 


( 11 ) 


The likelihood Z(Yj |•) is assumed to be a Normal distribution with fixed variance a 2 . In this estimation 
procedure we kept Rt fixed at fixed at the posterior mean obtained as described above and fixed 
t = 1/7 days^ 1 . To complete the inference, we need to specify prior distributions for the parameters of 
interest. We place a flat Beta(l, 1) prior on S 0 j V) 

To approximate the posterior in (111 we use Markov chain Monte Carlo techniques implemented in the 
Bayesian inference with Python (BIP) [l2] available at http: //code. google. com/p/bayesian-inf erence/. 
BIP uses a Differential Evolution Adaptive Metropolis (DREAM) [l8] scheme that efficiently samples 
from high-dimensional joint distributions using multiple adaptive chains running in parallel with delayed 
rejection. Also, as the numerical integration routine implemented within BIP needs ft{t) to be available at 
arbitrary values of t, i.e., as continuous function of time because of the variable step size, we used linear 
interpolation to obtain values of Rt for any time point. In this study we used one chain per parameter, i.e, 

3 chains for each run. The chains were run until 5000 samples were obtained after discarding 500 burn-in 
samples. Convergence of the parallel chains was verified at every 100 iterations by the calculation of the 


Gelman-Rubin’s R (potential scale reduction factor), which approaches 1 at convergence 19 
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Calculating the attack ratio 

The attack ratio of an epidemic is defined by the number of infections divided by the size of the 
population at risk. 


A = 


# cases 


Population at risk 


Based on what has been discussed so far, we can rewrite (12) for each epidemic j as 

E Y j 


Aj = 


5 o j 


( 12 ) 


(13) 


where Sqj is the number of susceptibles before each epidemic j, which we estimated before. 

Python and R code to perform all the analyses described above is publicly available at https: 
//github.com/fccoelho/paperLMl 


Results and Discussion 

In this paper we propose a method to bypass the lack of serotype-specific case data by informing the 
time-varying force of infection with the instantaneous reproductive number, R t which we calculate from 
aggregated data. The main contribution of this paper can be summarized in the following items: (i) we 
show a method to quantify uncertainty about Rt that is readily applicable to other diseases and; (ii) we 
propose to use Rt to inform a dynamic epidemic model with time-varying force of infection in order to gain 
insight into the attack ratio of each epidemic; (iii) We propose an estimation procedure for circulating 
serotype’s So from aggregate case data, which is robust to epidemic sizes; We estimate the AR for 18 
years of Dengue epidemics. 

Figure [T] shows the Rt series, according to (JTJ) [ll] along with the confidence bands derived in this 
paper. It can be seen that the inter-epidemic periods are characterized by Rt being indistinguishable from 
1. Due to the intrinsic variability of the R t series, the examination of its credible intervals is essential to 
identify periods of sustained transmission. The wider intervals between epidemics are due to the scarcity of 
cases during these periods. The method to quantify uncertainty proposed here provides more conservative 
credibility intervals, and therefore offers protection against false alarms. 


A key epidemiological quantity is the attack ratio (AR) of an epidemic, a measure of morbidity and 
speed of spread which can be used to predict epidemic size and help efficient Public Health planning. The 
AR depends fundamentally on the population at risk, which in the case of dengue is every naive (to a 
particular serotype) individual in the population. Estimating the initial susceptible fraction Sq for each 
epidemic is thus central to the estimation of the AR. Methods for estimating the number of susceptibles 
have been proposed before, for other diseases [8}[9]. These methods try to reconstruct the entire series 
of infectious and susceptibles for measles outbreaks from case data. In the case of dengue, the full 
(multi-year/multi-epidemic) series of susceptibles to all possible serotypes, cannot be reconstructed based 
solely on a deterministic transmission model, since the arrival/re-emergence of new serotypes (which are a 
stochastic events) can change drastically the pool of susceptibles throwing off any sequential estimation 
based on the incidence dynamics. 

Since there is very limited information regarding the actual proportions of each virus in circulation 
and most information available is about the predominant serotypes for some epidemics in the period of 
study only 20j, we propose the use of a simplified a single strain model. The main argument we put 


forward is that by conditioning on the Rt series, we implicitly take into account the variability introduced 
by the co-circulation of multiple serotypes and heterogeneous levels of immunity in the general population. 
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Figure 1. Estimated time-series for f? t , along with 95% credible intervals. Top panel show 
reported cases from which R t is estimated. As expected, uncertainty about R t is greater when the case 
counts are low, for instance in the period 2003-2006, which represented a big hiatus between major 
epidemics. The intrinsic variability of Rt can be used to inform the time-varying force of infection, since it 
reflects variation in the vector population and other environmental factors such as temperature and 
seasonal variation. 


We sought to deal with all important sources of uncertainty impinging on the estimation of the AR of a 
dengue epidemic, but not all could be satisfactorily addressed in this analysis. For instance, in any given 
epidemic there is a large number of mild and asymptomatic cases, which nevertheless acquire immunity. 
It is estimated that for every case reported, up to 10-20 are not seen by health authorities 21 . Another 


source of uncertainty is under-reporting of diagnosed cases, which is a serious issue in the health care 
systems of many developing countries such as Brazil. Duarte and Franga (2006) {22], estimated the 
sensitivity of Dengue reporting for hospitalized patients in Belo-Horizonte, Brazil to be of 63%, meaning 
that approximately 37% of the suspected Dengue cases go unreported. Lastly, demography and migrations 
affect the number of susceptible in ways which are not easy to fully determine. 

Figure [3] shows the model from (10) fitted to the data. Despite its limitations, our simplified model 
fits the data well. In it we can see that the susceptibles series in each epidemic starts at the estimated 
level of Sq- The proportion of susceptibles may seem low, but we must remember that these estimates 
are being affected by an unknown under-reporting factor, which experts suggest is somewhere between 
5 and 10, i.e. for every case observed there are 5 or 10 unobserved. Since this under-reporting affects 
both the numerator and denominator of (13), its effects should cancel out, giving us an unbiased attack 
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(c) 2012 (d) 2013 


Figure 2. Maps showing the incidence of dengue in the city of Rio de Janeiro from 2010 
to 2013. Circles indicate individual notified cases. A heatmap is overlayed on the maps showing absolute 
density of cases. It can be seen that several areas of the city were affected and no region seems to be free 
of transmission risk. This suggests that although transmission risk varies spatially, there is significant 
exposure over the entire city. 


ratio estimate. One other possible source of bias which would lead to the underestimation of So could 
come from a significant part of the population not being exposed to the disease. However, as we can see 
in Figure [2j despite the differences in intensity (incidence), the entire city seems to be at risk, with no 
particularly “protected” areas, at least in the last four epidemics. 


Table [l] contains the attack ratios and medians of the So estimated for each epidemic/outbreak. It 
is interesting to notice that the larger epidemics, in terms of peak size are not the one with the greater 
attack ratios. This stresses the importance of knowing the immunological structure of the population. 
Knowing the So for the circulating viruses we can order to more accurately assess the potential impact of 
a coming epidemic, since particularly virulent types, can be rendered less of a threat by a low So- 

We hope that the results presented in paper will motivate public health authorities to invest in annual 
serological surveys, to determine the susceptibility profile to each dengue virus as well as to estimate the 
under-reporting factor of the notification system. 
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Figure 3. Susceptibles and Infectious posterior curves. The curves were estimated only for the 
periods where Rt > 1. The susceptible curves in the top panel reflect the prevalence of fraction of 
susceptibles to circulating strain(s) for each epidemic/outbreak. In the lower panel, we see the posterior 
distribution of infectious curves, represented by its median and 95% credible interval. Credible intervals 
are very narrow, and can be hard to distinguish from the median line. Dots show the observed cases, 
scaled as fractions of the entire population. 
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Table 1. Median attack ratio and 95% credibility intervals calculated according to (13). 

Values are presented as percentage of total population. Year corresponds to the start of the epidemic, 
however the peak of cases may occur in the following year. *: Susceptible fraction. These results show 
considerable variation in AR between epidemics, consistent with the accquiring and loss of 
serotype-specific immunity. 


Yearf 

median Attack Ratio 

Q$ 

D 0 

1996 

0.39 (0.17-0.54) 

0.00171(0.0012-0.0038) 

1997 

0.87 (0.74-0.87) 

0.00273(0.0027-0.0032) 

1998 

0.5 (0.49-0.5) 

0.00142(0.0014-0.0014) 

1999 

0.11 (0.037-0.2) 

0.00345(0.0018-0.01) 

2000 

0.25 (0.24-0.27) 

0.0155(0.015-0.016) 

2001 

0.48 (0.47-0.49) 

0.0495(0.048-0.051) 

2005 

0.15 (0.1-0.21) 

0.0147(0.01-0.021) 

2006 

0.11 (0.08-0.14) 

0.0281(0.022-0.037) 

2007 

0.15 (0.15-0.15) 

0.135(0.13-0.14) 

2008 

0.14 (0.031-0.31) 

0.00672(0.003-0.024) 

2010 

0.18 (0.17-0.19) 

0.0454(0.043-0.048) 

2011 

0.086 (0.082-0.094) 

0.215(0.2-0.23) 

2012 

0.14 (0.13-0.15) 

0.0621(0.058-0.068) 
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Appendix 

A remark on prior distributions and tail behaviour of the distribution of R t 

There are a number of approaches to deriving the distribution of R t . Alternatively to the approach 
described in the main text 14 , one could use the conditional distribution of R t on Y t+ 1 and Y t as defined 
in equation A7 of Nishiura et al. [II]: 


f R (R t ) = (Y t R t ) Yt+1 e 


Y t Rt 


(14) 


Noticing the kernel of (141 is that of a gamma distribution with a 2 = Ft+i + 1 and b 2 = Y t , we obtain a 
proper density from which to construct c a (Rt), simply by computing the appropriate quantiles of said 
distribution. This density is 

bn 2 


f N (Rt\a 2 ,b 2 ) = -^-R?- le ~ b2R t 
r (a 2 ) 


(15) 
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In order to decide which approach to take, it may be of use analysing the tail behaviour of the derived 
distributions for R t . Consider the case of using a flat Uniform(0, 1) prior for 9 t . With a 0 = b 0 = 1, 
ai = 02 and b\ = &2 + 1- The beta prime (inverse beta distribution) will have heavier tails compared to 
the conditional distribution proposed by 11 j, thus providing more conservative confidence/credibility 
intervals. To see that one needs simply take the ratio of the Beta prime and Gamma (unnormalized) 
densities and evaluate the limit as Rt goes to infinity: 


lim 

Rt—to o 


/ P (i?. t |oi,bi) 
fN (Rt | ^2 7 b‘2 ) 


lim 

Rt —yoo 


e YtRt 

(1 + R t ) Yt+Yt + 1+2 


= oo 


(16) 


Finally, note that we deliberately construct c a {R t ) as a equal-tailed 100a% credible set, rather than a less 
conservative highest posterior density (HPD) interval. 

As a side note, the Bayesian approach presented in this paper will give similar results to orthodox 
confidence intervals 15 and 16 for Y t+ i and Y t >> 1. Under the flat uniform prior for 9 t , the Bayesian 


posterior credibility interval is nearly indistinguishable from the confidence interval proposed by Clopper 
& Pearson (1931) [l6] for Y t +i,Y t > 20. Note also that the uniform prior (Beta{ 1,1)) for 9 t constitutes a 
poor prior choice mainly because the induced distribution for R t is only well-defined for bo > 2. 

An advantage of the Bayesian approach is that one can devise prior distributions for 9 t taking advantage 
of the intuitive parametrization and flexibility of the beta family of distributions. Prior elicitation can 
also be done for R t and the hyper-parameters directly plugged into the prior for 9 t . One can, for example, 
choose prior mean and variance for R t and find a o and bo that satisfy those conditions. Let mo and vq be 
the prior expectation and variance for R t . After some tedious algebra one finds 


a 0 = 


bo — 


m 0 vo +mo + rrio 
vo 

2v 0 + ml + m 0 
vo 


(17) 

(18) 


If one wants only to specify mo and the coefficient of variation c = yjvo/mo for R t a priori , some less 
boring algebra gives: 


a 0 = 

?71qC 2 +rriQ + ml 

(19) 

mgC 2 

bo = 

2mgC 2 + to 2 + m 

(20) 

m.gC 2 


This approach thus makes it possible to incorporate epidemiological knowledge about disease Biology 
(e.g. the magnitude of Ro) into the computation of Rt . This may prove particularly important when 
disease counts are low and/or close to the detection threshold. We provide an R script to perform the 
above elicitation at https://github.eom/fccoelho/paperLMl/blob/master/R/elicit_Rt_prior.R. 










